PREFACE

This  paper  is  aimed  at  developing  tools  to  control  efficiently 
the  flow  of  jobs  and  job  traffic  in  a  network  of  computers.   Input  of 
jobs  to  each  center  is  controlled  by  predetermined  information  based 
on  probabilities  and  stored  in  table  form.  These  probabilities  are 
developed  mathematically,  predicated  on  the  fact  that  we  consider  the 
input  rate  to  be  a  random  variable  capable  of  assuming  any  size.  The 
table is then extended to handle the dispatching of jobs that must be
rerouted between different centers in the network, and an efficient
controller is thus developed.
1.   INTRODUCTION

"For which of you, wishing to build a tower, does not sit
down first and calculate the outlays that are necessary, whether he
has the means to complete it? Lest, after he has laid the foundation
and is not able to finish, all who behold begin to mock him, saying,
'This man began to build and was not able to finish!'"

St. Luke 14:28-29

The  Objectives 

Two  thousand  years  ago  the  importance  of  scheduling  was 
recognized.  Having  enough  material  to  start  and  finish  a  job  while  not 
having  too  much  unused  upon  completion  introduced  the  art  of  resource 
allocation.  Since  then,  much  attention  has  been  paid  to  the  area  of 
schedules  and  to  the  algorithms  which  alleviate  or  at  least  somewhat 
subdue  the  problems  associated  with  scheduling  jobs,  or  tasks,  in  a 
particular  environment.  The  amount  of  attention  is  due  to  the  fact  that 
an  ever  increasing  complexity  in  the  type  and  number  of  jobs  demanded 
by  a  mechanically  expanding  society  necessitates  their  fast  and  efficient 
completion. One environment which by its very nature demands painfully
high levels of efficiency, and the one towards which we turn our attention,
is the area of network computers.


The  term  network  computers  can  mean  different  things  to 
different  people.   It  could  be  the  connection  of  two  or  more  processors 
at  a  single  computer  installation.  Or  it  could  mean  the  connection 
by  telephone  wires  of  many  geographically  distant  and  distinct  single 
machines.  We  will  combine  both  of  these  ideas  in  our  definition.  We 
will  use  the  term  network  computer  to  mean  the  connection  by  some 
communication  facility  of  geographically  distant  and  distinct  computing 
centers,  each  of  which  has  a  number  of  processors  and  peripheral  devices. 

In  recent  years,  schedulers  have  turned  their  attention  to  the 
area  of  network  computers  in  an  attempt  to  more  efficiently  use  the 
tremendous  computing  power  of  such  a  system.  Efforts,  on  their  part, 
have  generated  many  priority  assignment  rules  and  scheduling  algorithms 
effecting  fast  and  efficient  throughput  of  jobs  at  a  particular  center. 
They have had a tendency, though, to concentrate on just one center,
forgetting, perhaps, that this is only a small part of the entire picture.
While we concede the importance of this work, we feel that some
scheduling problems, as they relate to the entire network, deserve
attention.

One area which has been all but neglected is that of load
leveling, or load regulation, for the entire network. It is not
unreasonable to expect that all jobs once accepted at a center will be run
to completion at that center. It may happen, however, that because of
mismanagement at or failure of a center, some jobs already accepted
into the network cannot be run at the center intended. Should the user
then be called and told that his job will be delayed or not run at all?
We think not. Management would frown if every effort were not made to
meet our obligations. We would like, then, to have these types of jobs
run at another center in our network so as still to meet the deadlines.
We would, however, still like to enjoy the benefits of making some profit,
even though through our own fault it will be reduced.

Towards this goal, then, we develop in Chapter 2 a load regulation
scheme meeting our objectives, the most important of which is to minimize,
as much as possible, the chances of having this overloaded condition.
Upon discovering that an overloaded condition exists at one center, we
must choose another center that is capable of running the excess jobs.
This is the object of Chapter 3.


2.   THE  LOAD  REGULATOR 


2.1.  Description  of  the  Network  and  Load  Regulator 

We speak of load regulation in a computer network as the
scheduling and routing of jobs so that available resources are used
efficiently to implement fast completion of all jobs. In a
multiprocessor system such as we deal with, each processor should be loaded all
the time to be used to best advantage. A processor sitting idle while
another has a full load or is even overloaded is not only sub-optimal
but also a gross mismanagement of an expensive asset, machine time.
Jobs originally scheduled to be run on a machine now overloaded should
be rerouted. We could ask, at this point, what happens to incoming
jobs when the entire system is already running at full capacity, but we
will leave this as a management consideration. Our concern will be with
the efficiency of our full system. The tool that we will use in this
controlling function we will dub, oddly enough, the load regulator.

We wish, then, to construct a feasible and workable load
regulator for the entire network: a regulator that will periodically
check each of the nodes in our system and immediately be able to determine
whether that particular center is in danger of being overloaded or is, in fact,
already overloaded. After this status check, we would like our load
regulator to take the proper action, if necessary, to alleviate the
impending or present congestion. We will allow our regulator to make
the decision either to inhibit or reduce further input to the center or
to let the input flow continue at the present rate. We will not, at this
point, force our regulator to be concerned with the rerouting of jobs
whose entry it has refused.

Our  network  will  consist  of  n  computing  centers  with  a  varying 
number  of  processors  and  peripheral  equipment.  Each  processor  in  a 
center  draws  its  work  from  a  common  queue  relevant  only  to  that  center 
(i.e.  an  idle  processor  in  another  center  may  not  draw  work  from  the 
queue  of  an  overloaded  center  but  rather  must  wait  for  it  to  be  rerouted 
to it). We will assume that the input to the queue of a particular center
is a Poisson process with an unknown random mean λ. Due to the randomness,
we will allow λ to range from zero to infinity; despite the unbounded
input rate, we never really expect an infinite number
of jobs to be present at one time. Our load regulation will consist,
then, of periodically sampling the random queue size and, based
on predetermined information stored in table form, making the proper decision.

Again, since our queue size is a random variable, the information
stored in the decision table must be based on some estimate, made as accurately
as possible. Toward this goal, then, we will develop analytical tools
based on the probabilities of the queues reaching critical threshold
levels and then bound these probabilities by some arbitrary criterion ε,
which will be small.


We begin by characterizing our computer network of n centers
by the following:

(1)  Assume that the input to any center is a
Poisson process with unknown random mean λ,

(2)  Assume that the service time in each center is an
exponential random variable with service rate μ. Then the
average service time is 1/μ,

(3)  Assume that each center has a finite storage capacity c,

(4)  The queue size of each center is sampled at discrete
instants of time Kδ, K = 0, 1, 2, ...,

(5)  A delay time Δ is associated with the load regulator,
where Δ is the time elapsed between the issuing of a control order and
its implementation,

(6)  Each center has a common queue of work relevant
only to that center,

(7)  Let q(t) be the number of jobs in the queue at time t;
then the criterion of estimation is to keep

Prob [queue size at time t > c] < ε for all t > 0, where ε
is a given small real number rather than zero.

For  ease  of  computation,  we  will  assume  that  each  node  is 
of  equal  stature  (i.e.  each  center  in  the  network  has  approximately 
equal  computing  power,  equal  speed,  and  on  the  average  equal  work  loads). 
Therefore, we can speak of the relationship between λ, μ, c, and Δ as
being the same for each center. Since computing centers and their effect
on the entire network, unless identical, should not be considered so,
this equating of the λ, μ, c, and Δ may seem like an unrealistic
approach. Intuitively it is, until we realize that a center far below
the capacity, speed, and overall computing power of the others would
have a smaller queue and, therefore, accept fewer jobs, jobs that were
shorter, and jobs that required less sophisticated computing power.
In this sense, then, the λ, μ, c, and Δ of a large center in the network
would be relatively and proportionally equal to those of the smaller
center. Since we are speaking of probabilities relative to each
separate center, this approach suits our purposes. For each center,
then, to use just the one decision table in the load regulator, it is
just a matter of a normalization factor with respect to the different
centers. To dispose of the problem of a consistent Δ,
we will assume that the load regulator occupies roughly the geographic
center of our network and, therefore, that the communication delay times
are equal. Our network, therefore, will roughly assume the shape of a
ring (see Figure 2.1.).

Figure 2.1.

2.2.   The Probability With a Known λ

Before proceeding with the development of the probabilities
for our system, we may do well to look at the probabilities for a similar
system when λ is known. Predicating our system on an unknown λ, as will
be shown in Section 2.3, necessitates using complicated mathematics
to solve long and messy equations. Morse [1] shows us that these messy
equations are not needed if the input rate is defined or can be
approximated closely enough to suit the purpose. When λ is known,
the probability of the queue at a center being overloaded can be
defined by:

Let P_n = the probability of n units in the system. Then,

$$P_n = \frac{1 - \lambda/\mu}{1 - (\lambda/\mu)^{N+1}}\,(\lambda/\mu)^n, \qquad n = 1, 2, \ldots, N$$

where N is the capacity of the queue. In our case, the probability
of the queue being overloaded is just the probability that there are
c+1 in the queue. Therefore,

$$P_{c+1} = \frac{1 - \lambda/\mu}{1 - (\lambda/\mu)^{c+2}}\,(\lambda/\mu)^{c+1}$$

which will simplify to

$$P_{c+1} = \frac{\lambda^{c+1}(\mu - \lambda)}{\mu^{c+2} - \lambda^{c+2}}.$$


We will see that this is much simpler to work with as we now move on to
describe the probabilities when λ is unknown.
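
As a quick numerical check of the simplification above, the following sketch evaluates the overload probability in both its original and simplified forms. The function names are our own and purely illustrative; the equal-rate branch (λ = μ) is an added limiting case that the text does not discuss.

```python
def overload_probability(lam: float, mu: float, c: int) -> float:
    """Simplified form: lam^(c+1) (mu - lam) / (mu^(c+2) - lam^(c+2))."""
    if lam == mu:
        # Assumed limiting value as lam -> mu: all c+2 states equally likely.
        return 1.0 / (c + 2)
    return lam ** (c + 1) * (mu - lam) / (mu ** (c + 2) - lam ** (c + 2))


def overload_probability_unsimplified(lam: float, mu: float, c: int) -> float:
    """Original form: (1 - rho) / (1 - rho^(c+2)) * rho^(c+1), rho = lam/mu."""
    rho = lam / mu
    return (1.0 - rho) / (1.0 - rho ** (c + 2)) * rho ** (c + 1)


if __name__ == "__main__":
    # Illustrative values only: c and mu as in the later example, lam assumed.
    print(overload_probability(0.1, 0.2, 5))
    print(overload_probability_unsimplified(0.1, 0.2, 5))
```

With λ/μ = 1/2 and c = 5 both forms reduce to 1/(2^7 - 1) = 1/127, which is a convenient hand check.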
2.3.  The Probabilities

We begin our analysis by developing the probabilities which
we will need. The following approach to the derivations is due mainly
to Bailey [2] and Saaty [3]. Assume that initially q(0) = 0 and that
our first discrete sampling time (Kδ, K = 0, 1, ...) which will have
any meaning is K = 1. Therefore, our first sample is taken at time 1·δ,
or δ, and the size of the sample is m. We wish then to define the
probability that there are m jobs in the queue at time δ given that
there were none in the queue at time 0. We can state this formally as

Prob [q(δ) = m | q(0) = 0].

Remembering that the input to each center is a Poisson process with
unknown random mean λ and that the service time is an exponential random
variable with an average service time of 1/μ, we define the following:

$$\mathrm{Prob}[1 \text{ arrival in time } \Delta t] = \lambda\,\Delta t \tag{1}$$

$$\mathrm{Prob}[>1 \text{ arrivals in time } \Delta t] = 0 \tag{2}$$

$$\mathrm{Prob}[1 \text{ service completion in time } \Delta t] = \mu\,\Delta t \tag{3}$$

$$\mathrm{Prob}[>1 \text{ service completions in time } \Delta t] = 0 \tag{4}$$

where Δt is a very small time interval, and (2) and (4) tend to zero in
the limit as Δt → 0, since the actual probabilities are O((Δt)^2) (read:
on the order of (Δt)^2), which is too small to be of any significance.
If we let

P_m(t) = Prob [m jobs in the system at time t]

then

P_m(t + Δt) = Prob [m in the system at time t and during
the interval t to t + Δt no jobs arrived and no jobs were serviced] +
Prob [m-1 in the system at time t and during the interval t to t + Δt
one job arrived] + Prob [m+1 jobs in the system at time t and during
the interval t to t + Δt one job was serviced].

Therefore,

$$P_m(t+\Delta t) = [1 - (\lambda+\mu)\Delta t]\,P_m(t) + P_{m-1}(t)\,\lambda\Delta t + P_{m+1}(t)\,\mu\Delta t + O((\Delta t)^2), \qquad m \ge 1 \tag{5}$$

and

$$P_0(t+\Delta t) = P_0(t)\,(1 - \lambda\Delta t) + P_1(t)\,\mu\Delta t + O((\Delta t)^2). \tag{6}$$

From the first equation (5), to first order in Δt,

$$P_m(t+\Delta t) = P_m(t) - (\lambda+\mu)\Delta t\,P_m(t) + P_{m-1}(t)\,\lambda\Delta t + P_{m+1}(t)\,\mu\Delta t$$

$$P_m(t+\Delta t) - P_m(t) = -(\lambda+\mu)\Delta t\,P_m(t) + P_{m-1}(t)\,\lambda\Delta t + P_{m+1}(t)\,\mu\Delta t$$

$$\frac{P_m(t+\Delta t) - P_m(t)}{\Delta t} = -(\lambda+\mu)P_m(t) + P_{m-1}(t)\,\lambda + P_{m+1}(t)\,\mu, \qquad m \ge 1$$

$$\lim_{\Delta t \to 0} \frac{P_m(t+\Delta t) - P_m(t)}{\Delta t} = P_m'(t) = \frac{dP_m(t)}{dt}.$$

Therefore,

$$\frac{dP_m(t)}{dt} = -(\lambda+\mu)P_m(t) + P_{m-1}(t)\,\lambda + P_{m+1}(t)\,\mu, \qquad m \ge 1 \tag{7}$$

and for

$$P_0(t+\Delta t) = P_0(t)\,(1 - \lambda\Delta t) + P_1(t)\,\mu\Delta t$$

we get

$$\frac{dP_0(t)}{dt} = -\lambda P_0(t) + \mu P_1(t). \tag{8}$$
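
Equations (7) and (8) can be explored numerically before any closed form is derived. The sketch below is only an illustration under our own assumptions: it integrates the equations by forward Euler on a state space truncated at a hypothetical `max_jobs`, and the function name, step size `dt`, and truncation rule are not part of the original derivation.

```python
def integrate_queue_equations(lam, mu, t_end, i=0, max_jobs=60, dt=1e-3):
    """Forward-Euler integration of equations (7) and (8).

    Returns the list [P_0(t_end), ..., P_max_jobs(t_end)] given i jobs at t = 0.
    The state space is cut off at max_jobs (an assumption of this sketch);
    arrivals are simply blocked in the top state so that probability is conserved.
    """
    p = [0.0] * (max_jobs + 1)
    p[i] = 1.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        new = p[:]
        # Equation (8): dP_0/dt = -lam P_0 + mu P_1
        new[0] += dt * (-lam * p[0] + mu * p[1])
        # Equation (7): dP_m/dt = -(lam + mu) P_m + lam P_{m-1} + mu P_{m+1}
        for m in range(1, max_jobs):
            new[m] += dt * (-(lam + mu) * p[m] + lam * p[m - 1] + mu * p[m + 1])
        # Truncated top state: no arrivals out of it, one service completion.
        new[max_jobs] += dt * (-mu * p[max_jobs] + lam * p[max_jobs - 1])
        p = new
    return p


if __name__ == "__main__":
    # Illustrative parameters: mu = 0.2 and t = 0.05 as in the later example,
    # lam = 0.1 is an assumed value.
    probs = integrate_queue_equations(lam=0.1, mu=0.2, t_end=0.05)
    print(probs[:3], sum(probs))
```

For small t the result should be close to P_0(t) ≈ 1 - λt, which follows directly from (8) with P_0(0) = 1.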


For P_m(t) the corresponding probability generating function is

$$\Pi(z,t) = \sum_{m=0}^{\infty} P_m(t)\,z^m. \tag{9}$$

Multiplying (7) and (8) by z^{m+1} and z, respectively, summing over all
values of m, and then using (9) shows that Π(z,t) satisfies the partial
differential equation

$$z\,\frac{\partial \Pi(z,t)}{\partial t} = (1-z)\,\{(\mu - \lambda z)\,\Pi(z,t) - \mu P_0(t)\}. \tag{10}$$

We  will  continue  the  derivation  under  the  assumption  that  initially  there 
are  i  jobs  in  the  system  we  are  looking  at  where  i  >  0. 

If initially we have t = 0 and a total of i jobs in the system,
we have the initial condition

$$\Pi(z,0) = z^i. \tag{11}$$

Using the Laplace transform with respect to time,

$$\phi^*(s) = \int_0^{\infty} e^{-st}\,\phi(t)\,dt, \qquad \mathrm{Re}(s) > 0 \tag{12}$$

and using the inverse, we have

$$\phi(t) = \frac{1}{2\pi i}\int_{C-i\infty}^{C+i\infty} e^{st}\,\phi^*(s)\,ds. \tag{13}$$

Applying (12) to (10) and using (11) gives

$$\Pi^*(z,s) = \frac{z^{i+1} - \mu(1-z)\,P_0^*(s)}{sz - (1-z)(\mu - \lambda z)}. \tag{14}$$

It follows from the definition of the Laplace transform of
Π(z,t) that Π*(z,s) must converge everywhere within the unit circle
|z| = 1, provided Re(s) > 0. Thus in this region, zeros of both the
numerator and the denominator on the right-hand side of (14) must
coincide. The zeros of the denominator, α_k(s), are obtained from the
equation

$$\alpha_k(s) = \frac{(\lambda+\mu+s) \pm [(\lambda+\mu+s)^2 - 4\lambda\mu]^{1/2}}{2\lambda}, \qquad k = 1, 2. \tag{15}$$

By Rouché's theorem the denominator [sz - (1-z)(μ - λz)] has only one zero
in the unit circle, and it can be seen from (15) that |α_2(s)| < |α_1(s)|.
Thus the numerator on the right-hand side of (14) must vanish when z = α_2(s).
Hence,

$$P_0^*(s) = \frac{\alpha_2^{i+1}}{\mu(1 - \alpha_2)}$$

and (14) can be written as

$$\Pi^*(z,s) = \frac{z^{i+1} - (1-z)\,\alpha_2^{i+1}/(1-\alpha_2)}{-\lambda\,(z-\alpha_1)(z-\alpha_2)}. \tag{16}$$

Multiplying the numerator and denominator by (1 - α_2) and factoring, we get

$$\Pi^*(z,s) = \frac{(z-\alpha_2)(z^i + \alpha_2 z^{i-1} + \cdots + \alpha_2^i) - z\alpha_2(z-\alpha_2)(z^{i-1} + \alpha_2 z^{i-2} + \cdots + \alpha_2^{i-1})}{\lambda\,\alpha_1 (z-\alpha_2)(1 - z/\alpha_1)(1-\alpha_2)}$$

and since

$$(1 - z/\alpha_1)^{-1} = \sum_{k=0}^{\infty} (z/\alpha_1)^k$$

and adding and subtracting α_2^{i+1}, we get

$$\Pi^*(z,s) = \frac{1}{\lambda\alpha_1}\,(z^i + \alpha_2 z^{i-1} + \cdots + \alpha_2^i)\sum_{k=0}^{\infty}(z/\alpha_1)^k + \frac{\alpha_2^{i+1}}{\lambda\alpha_1(1-\alpha_2)}\sum_{k=0}^{\infty}(z/\alpha_1)^k.$$

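P_m*(s) is the coefficient of z^m in the expansion; hence for m ≥ i

$$P_m^*(s) = \frac{1}{\lambda}\left[\frac{1}{\alpha_1^{m-i+1}} + \frac{\mu/\lambda}{\alpha_1^{m-i+3}} + \frac{(\mu/\lambda)^2}{\alpha_1^{m-i+5}} + \cdots + \frac{(\mu/\lambda)^i}{\alpha_1^{m+i+1}} + \left(\frac{\lambda}{\mu}\right)^{m+1}\sum_{k=m+i+2}^{\infty}\frac{(\mu/\lambda)^k}{\alpha_1^k}\right]$$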
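Now the inverse

$$P_m(t) = \frac{1}{2\pi i}\int_{C-i\infty}^{C+i\infty} e^{st}\,P_m^*(s)\,ds$$

and we get

$$P_m(t) = \frac{e^{-(\lambda+\mu)t}}{\lambda}\Big[\big(\sqrt{\lambda/\mu}\big)^{m-i+1}(m-i+1)\,t^{-1}\,I_{m-i+1}(2\sqrt{\lambda\mu}\,t) + (\mu/\lambda)\big(\sqrt{\lambda/\mu}\big)^{m-i+3}(m-i+3)\,t^{-1}\,I_{m-i+3}(2\sqrt{\lambda\mu}\,t) + \cdots + (\mu/\lambda)^i\big(\sqrt{\lambda/\mu}\big)^{m+i+1}(m+i+1)\,t^{-1}\,I_{m+i+1}(2\sqrt{\lambda\mu}\,t) + (\lambda/\mu)^{m+1}\sum_{k=m+i+2}^{\infty}\big(\sqrt{\mu/\lambda}\big)^k\,k\,t^{-1}\,I_k(2\sqrt{\lambda\mu}\,t)\Big]$$

where I_n(z) is a modified Bessel function of the first kind, and
substituting

$$\frac{2\nu}{z}\,I_\nu(z) = I_{\nu-1}(z) - I_{\nu+1}(z)$$

and simplifying, we get

$$P_m(t) = e^{-(\lambda+\mu)t}\Big[\big(\sqrt{\mu/\lambda}\big)^{i-m}\,I_{m-i}(2\sqrt{\lambda\mu}\,t) + \big(\sqrt{\mu/\lambda}\big)^{i-m+1}\,I_{m+i+1}(2\sqrt{\lambda\mu}\,t) + (1 - \lambda/\mu)(\lambda/\mu)^m\sum_{k=m+i+2}^{\infty}\big(\sqrt{\mu/\lambda}\big)^k\,I_k(2\sqrt{\lambda\mu}\,t)\Big] \tag{17}$$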
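which is the equation we sought. Finally, substituting 0 for i
(initially we have zero) and δ for our time, we get

$$\mathrm{Prob}[q(\delta) = m \mid q(0) = 0] = e^{-(\lambda+\mu)\delta}\Big\{\big(\sqrt{\mu/\lambda}\big)^{-m}\,I_m(2\sqrt{\lambda\mu}\,\delta) + \big(\sqrt{\mu/\lambda}\big)^{-m+1}\,I_{m+1}(2\sqrt{\lambda\mu}\,\delta) + (1 - \lambda/\mu)(\lambda/\mu)^m\sum_{k=m+2}^{\infty}\big(\sqrt{\mu/\lambda}\big)^k\,I_k(2\sqrt{\lambda\mu}\,\delta)\Big\} \tag{18}$$
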

The  above  probability  (18)  defines  our  chances  of  finding 
m  jobs  in  the  queue  at  any  of  the  centers  in  our  network  given  that 
there  were  none  to  start  with.  We  have  shown,  however,  that  q(0)  =  0 
is  not  a  necessary  restriction  and  that  we  may  start  with  any  number  in 
the  system. 
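
For a fixed, known λ the transient probability with i jobs initially present (of which (18) is the case i = 0, t = δ) can be evaluated directly. The sketch below is a minimal illustration under our own assumptions: the function names are hypothetical, the modified Bessel function is computed from its power series using only the standard library, and the infinite sum is truncated at `tail_terms` terms.

```python
import math


def bessel_i(k: int, x: float, terms: int = 80) -> float:
    """Modified Bessel function of the first kind, integer order, x >= 0.

    Uses the power series sum_j (x/2)^(2j+k) / (j! (j+k)!), evaluated in
    log space to avoid overflow. I_{-k} = I_k for integer k.
    """
    k = abs(k)
    if x == 0.0:
        return 1.0 if k == 0 else 0.0
    log_half = math.log(x / 2.0)
    return sum(
        math.exp((2 * j + k) * log_half - math.lgamma(j + 1) - math.lgamma(j + k + 1))
        for j in range(terms)
    )


def transient_probability(m, t, lam, mu, i=0, tail_terms=80):
    """Prob[q(t) = m | q(0) = i] for known rates lam and mu (t > 0)."""
    x = 2.0 * math.sqrt(lam * mu) * t
    r = math.sqrt(mu / lam)
    # Truncated version of the infinite sum over k >= m + i + 2.
    tail = sum(r ** k * bessel_i(k, x) for k in range(m + i + 2, m + i + 2 + tail_terms))
    bracket = (
        r ** (i - m) * bessel_i(m - i, x)
        + r ** (i - m + 1) * bessel_i(m + i + 1, x)
        + (1.0 - lam / mu) * (lam / mu) ** m * tail
    )
    return math.exp(-(lam + mu) * t) * bracket


if __name__ == "__main__":
    # Illustrative values: mu and delta as in the later example, lam assumed.
    lam, mu, delta = 0.1, 0.2, 0.05
    for m in range(4):
        print(m, transient_probability(m, delta, lam, mu))
```

Two simple sanity checks are that the probabilities over all m sum to one and that, for small t, the probability of an empty queue is close to 1 - λt.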

We now know the probability of finding m jobs in the queue
at time δ. What is even more important for us to know, at this point,
is whether or not m is greater than c (i.e. are there more jobs in the
queue than the system can handle). If m > c we would want to either
prohibit further input of jobs to that node or at least to reduce it
until the overloaded condition was no longer present. We are, therefore,
interested in knowing the probability, upon finding m jobs in the queue
at time δ, that the queue size exceeds the capacity for the center. Formally,
we want

Prob [q(t) > c | q(δ) = m]
Since λ is a random variable it is allowed to vary over the range
λ ∈ (0, ∞), and even the best means to estimate λ will often be grossly
in error. Rather than depend on this somewhat unstable statistic, an
average value of the quantity under consideration, Prob [q(t) > c],
will be derived, and this estimate will minimize, though by no means
eliminate, the error inherent in this calculation. Let P_{q(δ,m,0)}(λ) be
the normalized Prob [q(δ) = m | q(0) = 0] such that

$$\int_0^{\infty} P_{q(\delta,m,0)}(\lambda)\,d\lambda = 1;$$

then from the previous probability derivation with substitution it should
be clear that for t > δ, we have

$$\mathrm{Prob}[q(t) > c \mid q(\delta) = m] = \sum_{n=c+1}^{\infty}\int_0^{\infty} P_{q(\delta,m,0)}(\lambda)\,e^{-(\lambda+\mu)(t-\delta)}\Big\{\big(\sqrt{\mu/\lambda}\big)^{m-n}\,I_{n-m}[2\sqrt{\lambda\mu}\,(t-\delta)] + \big(\sqrt{\mu/\lambda}\big)^{m-n+1}\,I_{n+m+1}[2\sqrt{\lambda\mu}\,(t-\delta)] + (1 - \lambda/\mu)(\lambda/\mu)^n\sum_{k=n+m+2}^{\infty}\big(\sqrt{\mu/\lambda}\big)^k\,I_k[2\sqrt{\lambda\mu}\,(t-\delta)]\Big\}\,d\lambda \tag{19}$$

We might note that 1 - Prob [q(t) > c | q(δ) = m] is the probability
that the queue size is within efficient limits.
2.4.  Calculation of the α Change Factor

Before continuing with the probabilities we will look at some
general considerations that must be taken into account. Remembering
that the load regulator has a decision delay time of Δ, we note that no
change in the input rate (either a blocking or reducing change) can be
achieved before Δ + δ. This is because our first sampling time of any
importance is Kδ, where K = 1. Thus if the probability in equation (19)
exceeds ε at any time in the range (δ, Δ+δ), the load regulator will not
change λ in time to correct it. Furthermore, it should be noted that if
no change is ordered at time δ, then the earliest possible time for
effecting a change would be Δ+2δ (Δ time units later than the next
observation). Therefore, the present input rate λ should be maintained
if and only if Prob [q(t) > c | q(δ) = m] < ε for all t in the range
(δ, Δ+2δ). If a change is necessary it should be made so as to guarantee
that the criterion ε is met only for t in the range (δ, Δ+2δ), since at
time 2δ we may order another change if needed. We conclude, therefore,
that we wish to order a change in the input rate λ if and only if

Prob [q(t) > c | q(δ) = m] ≥ ε, for t ∈ (Δ+δ, Δ+2δ).

If by chance

Prob [q(t) > c | q(δ) = m] ≥ ε, for t ∈ (δ, Δ+δ)

which could happen, then our criterion will be violated.
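
The decision rule just stated can be sketched in a few lines. This is an illustration only: `change_needed` and its parameter names are hypothetical, the overload probability is passed in as an arbitrary callable rather than computed from (19), and the open interval (Δ+δ, Δ+2δ) is scanned on a simple grid, which is our assumption and not a method given in the text.

```python
from typing import Callable, Tuple


def change_needed(
    prob_overload: Callable[[float], float],
    sample_period: float,  # delta in the text
    delay: float,          # Delta in the text
    eps: float,
    grid: int = 100,
) -> Tuple[bool, float]:
    """Scan t in (Delta + delta, Delta + 2 delta) and report whether the
    overload probability reaches eps, together with the grid time at which
    it is largest."""
    lo = delay + sample_period
    hi = delay + 2.0 * sample_period
    best_t, best_p = lo, -1.0
    for j in range(grid):
        # Midpoints of the grid cells, so the open end points are never used.
        t = lo + (hi - lo) * (j + 0.5) / grid
        p = prob_overload(t)
        if p > best_p:
            best_t, best_p = t, p
    return best_p >= eps, best_t


if __name__ == "__main__":
    # A made-up overload probability that grows linearly with time.
    needed, t_worst = change_needed(lambda t: 0.1 * t, 0.05, 0.045, 0.01)
    print(needed, t_worst)
```

The time returned alongside the decision plays the role of the worst-case instant over the window, which is the point at which a change factor would have to restore the criterion.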
If a change is necessary and ordered, we will then change
λ to αλ, for 0 ≤ α ≤ 1, where α = 1 means no change, α = 0 means a complete
halt to all input to that center, and α in between is some reduction
factor. Assume that Max Prob [q(t) > c | q(δ) = m], for t in the range
(Δ+δ, Δ+2δ), occurs at time t = t_0 and that we need a change (i.e. the
probability is greater than ε). We then compute the change factor α
in the following way:

For q(0) = 0

Let θ = t_0 - (Δ + δ); then

$$\mathrm{Prob}[q(t_0) > c \mid q(\delta) = m,\ \lambda \text{ was changed to } \alpha\lambda \text{ at } \Delta+\delta] = \sum_{n=c+1}^{\infty}\int_0^{\infty}\sum_{m_1=0}^{\infty}\mathrm{Prob}[q(\Delta+\delta) = m_1 \mid q(\delta) = m]\cdot\mathrm{Prob}[q(t_0) = n \mid q(\Delta+\delta) = m_1]\,P_{q(\delta,m,0)}(\lambda)\,d\lambda \tag{20}$$

From this we get

$$\sum_{n=c+1}^{\infty}\int_0^{\infty}\sum_{m_1=0}^{\infty}\Big\{e^{-(\lambda+\mu)\Delta}\Big[\big(\sqrt{\mu/\lambda}\big)^{m-m_1}\,I_{m_1-m}(2\sqrt{\lambda\mu}\,\Delta) + \big(\sqrt{\mu/\lambda}\big)^{m-m_1+1}\,I_{m_1+m+1}(2\sqrt{\lambda\mu}\,\Delta) + (1-\lambda/\mu)(\lambda/\mu)^{m_1}\sum_{k=m+m_1+2}^{\infty}\big(\sqrt{\mu/\lambda}\big)^k\,I_k(2\sqrt{\lambda\mu}\,\Delta)\Big]\Big\}\cdot\Big\{e^{-(\alpha\lambda+\mu)\theta}\Big[\big(\sqrt{\mu/\alpha\lambda}\big)^{m_1-n}\,I_{n-m_1}(2\sqrt{\alpha\lambda\mu}\,\theta) + \big(\sqrt{\mu/\alpha\lambda}\big)^{m_1-n+1}\,I_{n+m_1+1}(2\sqrt{\alpha\lambda\mu}\,\theta) + (1-\alpha\lambda/\mu)(\alpha\lambda/\mu)^n\sum_{k=n+m_1+2}^{\infty}\big(\sqrt{\mu/\alpha\lambda}\big)^k\,I_k(2\sqrt{\alpha\lambda\mu}\,\theta)\Big]\Big\}\,P_{q(\delta,m,0)}(\lambda)\,d\lambda = \varepsilon \tag{21}$$

the derivation being similar to what we did before.

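For q(0) ≠ 0

Here we wish to make our decision at some time Kδ, with some
number of jobs i already present in the center when we checked at time
[K-1]δ. Let q(Kδ) = m, and let q([K-1]δ) = i. Then if no change was
ordered at time [K-1]δ, from (17) we have that

$$\mathrm{Prob}[q(K\delta) = m \mid q([K-1]\delta) = i] = e^{-(\lambda+\mu)\delta}\Big\{\big(\sqrt{\mu/\lambda}\big)^{i-m}\,I_{m-i}(2\sqrt{\lambda\mu}\,\delta) + \big(\sqrt{\mu/\lambda}\big)^{i-m+1}\,I_{m+i+1}(2\sqrt{\lambda\mu}\,\delta) + (1-\lambda/\mu)(\lambda/\mu)^m\sum_{k=m+i+2}^{\infty}\big(\sqrt{\mu/\lambda}\big)^k\,I_k(2\sqrt{\lambda\mu}\,\delta)\Big\} \tag{22}$$

If a change was ordered at time [K-1]δ, we have

$$\mathrm{Prob}[q(K\delta) = m \mid q([K-1]\delta) = i] = \sum_{m_1=0}^{\infty}\Big\{e^{-(\lambda+\mu)\Delta}\Big[\big(\sqrt{\mu/\lambda}\big)^{i-m_1}\,I_{m_1-i}(2\sqrt{\lambda\mu}\,\Delta) + \big(\sqrt{\mu/\lambda}\big)^{i-m_1+1}\,I_{m_1+i+1}(2\sqrt{\lambda\mu}\,\Delta) + (1-\lambda/\mu)(\lambda/\mu)^{m_1}\sum_{k=m_1+i+2}^{\infty}\big(\sqrt{\mu/\lambda}\big)^k\,I_k(2\sqrt{\lambda\mu}\,\Delta)\Big]\Big\}\cdot\Big\{e^{-(\alpha(K-1)\lambda+\mu)(\delta-\Delta)}\Big[\big(\sqrt{\mu/\alpha(K-1)\lambda}\big)^{m_1-m}\,I_{m-m_1}\big(2\sqrt{\alpha(K-1)\lambda\mu}\,(\delta-\Delta)\big) + \big(\sqrt{\mu/\alpha(K-1)\lambda}\big)^{m_1-m+1}\,I_{m+m_1+1}\big(2\sqrt{\alpha(K-1)\lambda\mu}\,(\delta-\Delta)\big) + \big(1-\alpha(K-1)\lambda/\mu\big)\big(\alpha(K-1)\lambda/\mu\big)^m\sum_{k=m+m_1+2}^{\infty}\big(\sqrt{\mu/\alpha(K-1)\lambda}\big)^k\,I_k\big(2\sqrt{\alpha(K-1)\lambda\mu}\,(\delta-\Delta)\big)\Big]\Big\} \tag{23}$$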


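where Δ < δ. Again, because of the random λ, we will minimize the error
in computing Prob [q(t) > c] by normalizing.

Let P_{q(Kδ,m,i)}(λ) be the normalized Prob [q(Kδ) = m | q([K-1]δ) = i]
such that

$$\int_0^{\infty} P_{q(K\delta,m,i)}(\lambda)\,d\lambda = 1;$$

then for t > Kδ, we have

$$\mathrm{Prob}[q(t) > c \mid q(K\delta) = m,\ q([K-1]\delta) = i] = \sum_{n=c+1}^{\infty}\int_0^{\infty} P_{q(K\delta,m,i)}(\lambda)\,e^{-(\lambda+\mu)(t-K\delta)}\Big\{\big(\sqrt{\mu/\lambda}\big)^{m-n}\,I_{n-m}[2\sqrt{\lambda\mu}\,(t-K\delta)] + \big(\sqrt{\mu/\lambda}\big)^{m-n+1}\,I_{n+m+1}[2\sqrt{\lambda\mu}\,(t-K\delta)] + (1-\lambda/\mu)(\lambda/\mu)^n\sum_{k=n+m+2}^{\infty}\big(\sqrt{\mu/\lambda}\big)^k\,I_k[2\sqrt{\lambda\mu}\,(t-K\delta)]\Big\}\,d\lambda \tag{24}$$

As before, we wish to change λ if and only if

Prob [q(t) > c | q(Kδ) = m, q([K-1]δ) = i] ≥ ε

for t in the range (Kδ + Δ, [K+1]δ + Δ). Assume, as before, that the
maximum of this probability over that range occurs at t = t_0. Now let
θ = t_0 - (Kδ + Δ). Then

$$\sum_{n=c+1}^{\infty}\int_0^{\infty}\sum_{m_1=0}^{\infty}\Big\{e^{-(\lambda+\mu)\Delta}\Big[\big(\sqrt{\mu/\lambda}\big)^{m-m_1}\,I_{m_1-m}(2\sqrt{\lambda\mu}\,\Delta) + \big(\sqrt{\mu/\lambda}\big)^{m-m_1+1}\,I_{m_1+m+1}(2\sqrt{\lambda\mu}\,\Delta) + (1-\lambda/\mu)(\lambda/\mu)^{m_1}\sum_{k=m+m_1+2}^{\infty}\big(\sqrt{\mu/\lambda}\big)^k\,I_k(2\sqrt{\lambda\mu}\,\Delta)\Big]\Big\}\cdot\Big\{e^{-(\alpha\lambda+\mu)\theta}\Big[\big(\sqrt{\mu/\alpha\lambda}\big)^{m_1-n}\,I_{n-m_1}(2\sqrt{\alpha\lambda\mu}\,\theta) + \big(\sqrt{\mu/\alpha\lambda}\big)^{m_1-n+1}\,I_{n+m_1+1}(2\sqrt{\alpha\lambda\mu}\,\theta) + (1-\alpha\lambda/\mu)(\alpha\lambda/\mu)^n\sum_{k=n+m_1+2}^{\infty}\big(\sqrt{\mu/\alpha\lambda}\big)^k\,I_k(2\sqrt{\alpha\lambda\mu}\,\theta)\Big]\Big\}\,P_{q(K\delta,m,i)}(\lambda)\,d\lambda = \varepsilon \tag{25}$$
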


We can see that solving (21), (23), and (25) for α is at best a tedious
and complicated operation. We can console ourselves by the fact that
the α change factors need only be computed once and then stored away in
the table of the load regulator; after this it is just a referencing
operation. It could also be noted, at this point, though it should be
obvious, that for each value of m observed at Kδ in the range (0, c) a
decision with regard to the input rate λ can be computed from the previous
decision value of i observed at [K-1]δ. We can then generate our table
in the following form:

q(K)    q(K-1)    α Change Factor

 0        0          α(0)
          1          α(1)
          ...
          c          α(c)
 ...
 c        0          α(0)
          ...
          c          α(c)

For purposes of illustration, a table was generated according
to the following criteria:

(1)  Storage capacity, c = 5

(2)  Average service time = 5 units of time, therefore, μ = 0.2

(3)  Queue scan at Kδ, K = 0, 1, 2, ... with δ = .05 units of time

(4)  Delay time Δ = .045 units of time

(5)  Keep the probability that the queue size exceeds c bounded
below ε, where ε = .01.

The first column of the table, q(K), is the observed sample at
time K, the second column of the table, q(K-1), is the observed sample
at (K-1). The third column of the table is the required input
multiplicative change factor α.

We can see from looking at the table that, with the parameters
chosen as they were, the load regulation scheme's decision seems highly
dependent on the present observed queue size. This is obviously the case
since, with only a few exceptions, the first half of the possible queue
samples, 0, 1, and 2, require an α change of 1 (i.e. no change), while
the last half, 3, 4, and 5, require prohibition of all further inputs.
Upon expanding this table for bigger queues we will find that this is
also the case. We can reasonably expect that a queue that is empty
to approximately half full will require no change, that a queue around half
full will suffer some reduction in input, and that a queue from approximately
half full to full will have its input rate λ completely prohibited.
q(K)    q(K-1)    α Change Factor

 0        0          one
          1          one
          2          one
          3          one
          4          one
          5          one

 1        0          one
          1          one
          2          one
          3          one
          4          one
          5          one

 2        0          .54
          1          .87
          2          one
          3          one
          4          one
          5          one

 3        0          zero
          1          zero
          2          zero
          3          zero
          4          zero
          5          .32

 4        0          zero
          1          zero
          2          zero
          3          zero
          4          zero
          5          zero

 5        0          zero
          1          zero
          2          zero
          3          zero
          4          zero
          5          zero

c = 5,  μ = 0.2,  ε = .01,  Δ = .045,  δ = .05

Table 2.1.
We can also see that for servicing an entire network of
multi-processing computer centers the size of the table (i.e. the
amount of memory storage in the load regulator) is not at all excessive.
With the addition of the zero capacity possibility, our table has
(c + 1)^2 entries. With the relative symmetry of one and zero around
the middle of the table, it is not unreasonable to expect that even
this figure could be reduced. We may reduce it in the following manner
using the data from Table 2.1.

We will let an entry of the form A,B, where A and B are real
numbers, define a range from the first q(K-1) value to the last for the
same q(K) that require the same α change factor. For example, in Table
2.1. for q(K) equal to zero we have for q(K-1) from zero to five an α
change factor of one (no change). Then for q(K) equal to zero the table
entry of q(K-1) will be 0,5. The entire Table 2.1. could then be
decreased in size to


q(K)    q(K-1)    α Change Factor

 0       0,5         one

 1       0,5         one

 2       0,0         .54
         1,1         .87
         2,5         one

 3       0,4         zero
         5,5         .32

 4       0,5         zero

 5       0,5         zero

Table 2.2
A total of nine entries now define the entire Table 2.1., a seventy-five
percent reduction. A simple hashing algorithm can now be applied to find
the correct q(K-1) range. While we do not expect this kind of reduction
in all tables that would be generated, we would expect some. In fact,
all we can say is that the number of entries in the new table is bounded
by (c + 1) and (c + 1)^2. Our intuition, however, tells us that the
number will be far less than (c + 1)^2. With these ideas in mind, we
will now move on to discuss what happens to the jobs that are refused
at some center by our load regulation scheme.
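
As an illustration only, the range form of Table 2.2 might be stored and
searched as in the following sketch. The names RANGE_TABLE and
change_factor are hypothetical, the entries "one" and "zero" are written
as 1.0 and 0.0, and a binary search over the range starts stands in for
the hashing step mentioned above, which is not specified further.

```python
from bisect import bisect_right

# Range form of Table 2.2: for each q(K), a sorted list of
# (first q(K-1), last q(K-1), change factor) entries.
RANGE_TABLE = {
    0: [(0, 5, 1.0)],
    1: [(0, 5, 1.0)],
    2: [(0, 0, 0.54), (1, 1, 0.87), (2, 5, 1.0)],
    3: [(0, 4, 0.0), (5, 5, 0.32)],
    4: [(0, 5, 0.0)],
    5: [(0, 5, 0.0)],
}


def change_factor(q_now, q_prev, table=RANGE_TABLE):
    """Look up the change factor for queue sizes q(K) and q(K-1)."""
    ranges = table[q_now]
    starts = [first for first, _last, _factor in ranges]
    # Index of the last range whose first value is <= q_prev.
    idx = bisect_right(starts, q_prev) - 1
    if idx < 0:
        raise KeyError(f"q(K-1)={q_prev} lies below every range")
    _first, last, factor = ranges[idx]
    if q_prev > last:
        raise KeyError(f"q(K-1)={q_prev} lies above every range")
    return factor


def entry_count(table=RANGE_TABLE):
    """Number of range entries stored in the compressed table."""
    return sum(len(ranges) for ranges in table.values())


def expand(table=RANGE_TABLE):
    """Rebuild the full (c+1)^2 table from the range form."""
    return {
        (q_now, q_prev): factor
        for q_now, ranges in table.items()
        for first, last, factor in ranges
        for q_prev in range(first, last + 1)
    }
```

Expanding the nine range entries reproduces all thirty-six entries of
the full table for c = 5, which is the seventy-five percent reduction
noted above.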
3.  THE DISPATCHER

3.1.  An  Introduction  to  the  Problem 

In  speaking  of  our  load  regulation  scheme,  we  touched  lightly 
on some of the problems inherent in a working computer network. We also
paid lip service to the fact that there are some problems which must be
solved by management and not by our scheme or any other (e.g. what happens
to a job when the entire system is full and no center can accept it). One
problem  which  we  chose  to  lay  aside  before,  but  which  now  requires  our 

attention,  is  the  rerouting  of  jobs  from  a  full  or  an  overloaded  center 
to  one  less  busy.  We  will  handle  this  by  incorporating  into  our  load 
regulation  table  other  decision  making  material  adequate  to  efficiently 
handle  this  redistribution  of  the  work  load.  We  will  then  call  our 
scheme  the  load-regulator-dispatcher. 

We  note  at  this  point  that  we  could  have  made  things  considerably 
easier  on  ourselves  in  the  beginning.   If  all  jobs  were  initially  routed 
to  and  dispatched  from  the  load  regulator  instead  of  a  common  queue  at 
each  center,  our  problem  would  already  be  solved.  The  load  regulator 
would  know  at  each  discrete  sampling  time  the  sizes  of  all  the  queues 
in  the  network  and  would  route  new  jobs  to  centers  it  knew  could  handle 
them.  But  this  would  have  been  a  costly  if  not  an  unrealistic  approach; 
the  time  and  the  money  wasted  to  send  a  job,  possibly  hundreds  of  miles, 
before  it  is  even  started,  negate  all  the  advantages  of  this  idea.  It 
would  also  restrict  the  type  of  jobs  we  would  accept  as  the  cost  of 
transmitting and running and re-transmitting a short, fast job may
outweigh the reward we would receive from it. We will, therefore, use the
load-regulator-dispatcher to reroute only the jobs whose entry was
refused  at  one  of  the  centers.  We  begin  by  discussing  the  communication 
links  that  hold  our  network  together. 

3.2.  The Communication Links

In  constructing  our  network  (Figure  2.1.),  we  omitted 
description  of  the  communication  links  connecting  the  centers  in  our 
network.  We  did  this  because  the  question  of  inter-center  communications 
was  not  relevant  to  our  discussion  of  load  regulation.   It  sufficed  to 
say,  at  the  time,  that  jobs  refused  entry  at  one  center  would  have  to  be 
rerouted  to  another  center.   Since  it  is  our  intention  in  this  chapter 
to  advocate  a  workable  dispatcher  to  handle  this  rerouting  problem,  we 
next  consider  the  question  of  communication  channels  between  our  centers. 
For  purposes  of  clarification  and  completeness,  we  will  talk  about 
three  possible  communication  configurations,  the  last  of  which  we  choose 
for  our  network. 

The  simplest  way  of  forming  a  communications  network  is  to 
provide  each  center  with  a  communications  line  connecting  it  with  every 
other center. Figure 3.1. shows our system if this approach is used.
Assuming that each communications link is bidirectional (i.e. a link
from center 1 to center 3 implies a link from center 3 to center 1),
a network with n centers has n(n-1)/2 links. This configuration is
optimal from a communications standpoint as it allows one center to
communicate directly with all others but, unfortunately, it is only
practical for a network with a very small number of centers. The cost of
the many lines when n is large is prohibitive and forces us to seek a
less optimal but more economical solution.

Figure 3.1.

Our obvious line of attack, therefore, is to eliminate as many
connections as possible while still maintaining efficient communications
in the network. The minimum number of communication links that will
allow our network to function (i.e. each center has the capability
to communicate with all others, though not necessarily by direct means)
is achieved by one bidirectional line between geographically adjacent
centers. This situation is portrayed in Figure 3.2.

Figure 3.2.


If a job is refused in a network that is connected in this
manner, it cannot always be directly transmitted to the center that has
accepted it, if any. The i-th center has direct links with only the
(i-1)-st and the (i+1)-st centers. If a job is to travel from center i
to a center j that is not connected to center i by a bidirectional
communication line, it must travel through the intermediary center or
centers. When transmitting a job, our wish is to minimize the number of
centers that we have to pass through to reach a particular accepting
center. We, therefore, define the following.
Let negative traffic flow be the transmission of a job clockwise
in Figure 3.2. and positive traffic flow be the transmission of a job
counterclockwise in Figure 3.2. If a job to be rerouted from a refusing
center i to an accepting center j may travel in either the positive or
the negative direction so as to minimize the number of transmissions,
then center j can be reached in not more than n/2 transmissions
according to the following rules (see Table 3.1.)*.

For:                                    Direction of Job Travel

j > i and (j-i)/n < 1/2                 Positive

j > i and (j-i)/n ≥ 1/2                 Negative

j < i and -1/2 < (j-i)/n < 0            Negative

j < i and -1 < (j-i)/n ≤ -1/2           Positive

TABLE 3.1.
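
The direction rule can be sketched in a few lines. This is a minimal
illustration, not part of the original scheme: the function name
ring_route is hypothetical, positive flow is taken to mean travel toward
increasing center numbers (which is how the rules read for j > i), and a
tie at exactly half the ring is sent in the positive direction, since
either direction then costs the same number of transmissions.

```python
def ring_route(i, j, n):
    """Direction and number of transmissions from center i to center j.

    Centers are numbered 1..n around a ring, with center n joined to
    center 1. Returns a (direction, transmissions) pair.
    """
    if not (1 <= i <= n and 1 <= j <= n):
        raise ValueError("centers must be numbered 1..n")
    forward = (j - i) % n    # transmissions needed in the positive direction
    backward = (i - j) % n   # transmissions needed in the negative direction
    if forward == 0:
        return ("none", 0)
    if forward <= backward:
        return ("positive", forward)
    return ("negative", backward)


def worst_case(n):
    """Largest number of transmissions between any two centers."""
    return max(
        ring_route(i, j, n)[1]
        for i in range(1, n + 1)
        for j in range(1, n + 1)
    )
```

For a ring of eight centers, ring_route(1, 7, 8) sends the job two
transmissions in the negative direction rather than six in the positive
one, and worst_case(8) is 4, that is n/2.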

This scheme applies itself well and efficiently (only n
transmission lines) when n is a small number. When n gets larger,
the cost to transmit a job the maximum number of times (n/2) may be
more than the worth or the reward of the job itself.


*Centers are numbered arbitrarily but consecutively 1,2,...,n in a
ring formation with the n-th node connected to the first node. If the
centers are numbered in some other manner, these rules do not hold.
We also encounter a question of reliability in this configuration.
We want our network to be such that we can depend on it. If one node
connector fails in this scheme it may drastically hinder the running
of the entire network. Since the sending of refused jobs depends so
critically on finding the minimum transmission path, we cannot tolerate
a breakdown in the communications between any two centers. To illustrate
this point dramatically, picture the situation where the node between the
refuser and acceptor fails. Instead of two transmissions using the
fallen node, we now must use n-2 transmissions to reach the acceptor
at a much greater cost. And if two centers failed at the same time,
unless they were adjacent, it would mean a complete isolation of at least
one center from the rest. Therefore, we will turn our attention to a
scheme that uses more transmission lines to effect faster and more reliable
communications between the geographically distant centers. The scheme
we deem feasible and propose to use is the connection, by one bidirectional
line, of a center with its two adjacent centers and, in turn, with their
adjacent centers (see Figure 3.3.).

Figure 3.3.
As can easily be seen, we have added only n more lines for
a total in this configuration of 2n. This may not seem like an addition
significant enough to increase our efficiency but we have proposed this
design for three reasons:

(1) Reaching the j-th center farthest from center i can
be achieved twice as fast.

(2) We are talking about computer centers with multi-processor
capacities, not about inexpensive equipment. Each center is a large
financial investment and, therefore, the number of centers is bounded
by the availability of funds. We feel a network of sixteen centers
to be sufficiently large for any purposes. Our scheme will work well
with this number of centers.

(3) Should one center fail there is still a communication
line maintaining a possible minimum path in the direction from the
sender through the failed node, to the receiver.

In the light of this scheme, we find that the maximum number of
transmissions from center i to center j is ⌈n/4⌉* if n is not exactly
divisible by 4 and n/4 otherwise. It may be noted that the rules
governing the positive or negative flow of a job to minimize the number
of transmissions are the same as in Table 3.1. We feel that this
transmission linkage is adequate to handle the inter-center communications
we desire. We will next discuss some general considerations and then
describe an algorithm to be used for the rerouting.
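
The three configurations can be compared numerically. The sketch below
is an illustration under stated assumptions and not part of the original
analysis: centers are numbered 0..n-1 for convenience, the function names
are hypothetical, and the worst-case number of transmissions is found by
a plain breadth-first search with every link and center working.

```python
from collections import deque


def build_links(n, offsets):
    """Undirected links joining each center to those `offset` steps away."""
    links = set()
    for i in range(n):
        for d in offsets:
            j = (i + d) % n
            if i != j:
                links.add(frozenset((i, j)))
    return links


def max_transmissions(n, links):
    """Worst-case number of transmissions between any two centers."""
    adjacent = {i: set() for i in range(n)}
    for link in links:
        a, b = link
        adjacent[a].add(b)
        adjacent[b].add(a)
    worst = 0
    for source in range(n):
        dist = {source: 0}
        pending = deque([source])
        while pending:
            u = pending.popleft()
            for v in adjacent[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    pending.append(v)
        if len(dist) < n:
            raise ValueError("network is not connected")
        worst = max(worst, max(dist.values()))
    return worst


def compare(n):
    """(number of links, worst-case transmissions) per configuration."""
    configurations = {
        "complete": range(1, n),   # every center linked to every other
        "ring": (1,),              # adjacent centers only
        "double ring": (1, 2),     # adjacent centers and their neighbors
    }
    result = {}
    for name, offsets in configurations.items():
        links = build_links(n, offsets)
        result[name] = (len(links), max_transmissions(n, links))
    return result
```

For the sixteen-center network considered above, compare(16) gives 120
links and one transmission for the complete configuration, 16 links and
8 transmissions for the ring, and 32 links and 4 transmissions for the
proposed configuration, in agreement with n(n-1)/2, n/2 and n/4.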

3.3.  General  Considerations  for  Rerouting 


Before  formally  stating  the  rerouting  algorithm,  we  shall 
consider  the  reasons  why  the  job  was  refused  entry  to  a  center  in  the 
first place. We will also discuss under what conditions a center will
accept work that another has turned down. It may happen that a job is
refused at one center and no other center can accept it at the time; is
the  job  then  lost  to  the  system  entirely  or  should  it  be  resubmitted  at  a 
later  time?  We  will  now  focus  our  attention  on  these  and  other  questions 
of  interest. 


*Read: the ceiling of n/4, meaning the next integer higher than n/4.
From Chapter 2 we see that the load regulation scheme used
refuses  entry  to  the  incoming  job  solely  on  the  basis  of  the  queue  size 
probabilities  developed  there.  We  feel,  at  this  point,  that  a  more 
thorough  discussion  of  the  idea  of  critical  queue  size  is  in  order.  It 
is  implied,  though  not  intended,  that  the  decision  to  accept  or  reject 
work  rests  strictly  on  the  number  of  jobs  that  are  held  in  the  queue, 
and  when  the  queue  reaches  the  critical  number,  the  load  regulator  would 
inhibit  further  inputs  to  the  center.  While  the  queue  size  is  one  of 
the  factors  in  determining  whether  or  not  to  accept  work,  it  is  not  the  only 
one.  Another  factor  in  the  decision  is  the  expected  execution  time  of 
the  jobs  waiting  in  the  queues.  If  this  expected  execution  time  is  large, 
then  incoming  jobs  joining  this  queue  may  have  to  wait  for  a  long  time. 
It  may  be  reasonable  under  these  circumstances  to  assume  that  the  center 
has  reached  its  saturation  point  and  to  inhibit  any  further  input  to  it. 
In  this  light,  we  would  inhibit  input  even  though  the  number  of  jobs  is 
less  than  the  maximum  queue  size. 

We,  therefore,  observe  that  the  decision  to  accept  work  is  a 
function  of  the  execution  time  as  well  as  the  number  in  the  queue.  We 
must  remember,  however,  that  the  critical  queue  size  in  our  equations  is 
dependent  only  on  the  number  of  jobs.  We  must,  therefore,  express  the 
queue  size  in  terms  of  the  number  of  equivalent  average  jobs,  rather 
than  just  the  number  of  jobs  actually  in  the  queue  so  that  our  load 
regulator  will  still  work  effectively.  We  do  this  in  Appendix  A.   In 
summary,  then,  we  have  the  equivalent  number  of  average  jobs,  ENJ, 
given  by 

    ENJ = (TET_i / CRIT_i) · CAP_i

where

    CAP_i  = the maximum queue capacity of the i-th center

    TET_i  = the total execution time for the i-th center

    CRIT_i = the critical execution time of the i-th center


Another reason for refusal of a job is that the initial center
is just not equipped to handle it. In this case, the job must be
rerouted to a center capable of running it. For ease of understanding
in the formation of the rerouting algorithm, we will assume that this
does not happen: any job entering the network is capable of being run
at the center where it is input initially or at any other in the system.
Again we will leave it to management to decide what happens to
jobs that cannot be run at some center.

With  the  discussion  of  the  refusing  center  completed,  we  can 
talk  about  the  center,  if  any,  that  will  receive  the  job.   It  would  be 
expedient,  at  this  point,  to  say  that  the  circumstances  governing  a  center 
accepting  a  job  from  another  center  are  exactly  opposite  from  the  reasons 
the  first  center  refused  it  (i.e.  it  has  room  in  the  queue  and  plenty  of 
time  in  which  to  run  it).  But  it  would  also  be  incomplete.  The  biggest 
factor in the acceptance of a job another center refused is the cost
involved. In particular, the receiver must be able to ascertain if the
network will still profit despite the cost of transmission and
retransmission of results between the refuser and the acceptor. If one
center refuses a job and no other center can accept it, one of two
things can happen:

(1) the refusing center can try to make room for the job
by sending one or more jobs to another center (at a profit, of course),

(2) or the job is lost to the network and must be forgotten
or re-submitted at a later time.

Money,  here  as  in  most  areas  of  any  interest,  is  the  governing 
force.  With  these  considerations  in  mind  we  will  allow  the  user  to  place 
a  time  estimate  on  his  job,  and  to  assign  to  it  a  priority  number  between 
1  and  some  x  (the  higher  the  number  the  higher  the  priority),  used  to 
position  his  job  in  the  queue.  We  assume  here  that  an  incoming  job 
with  some  priority  p,  which  causes  an  overload  and  forces  the  load 
regulator  to  inhibit  further  inputs,  will  not  displace  jobs  of  priority 
less  than  p  unless  it  cannot  be  run  elsewhere.   Now,  we  will  finally 
consider  the  algorithm. 

3.4.  The Algorithm
Let

CAP_i, i = 1,2,...,n   = the queue capacity at the i-th center

AT_i, i = 1,2,...,n    = the average execution time of jobs
                         that enter the i-th center

CRIT_i, i = 1,2,...,n  = the critical execution time of the i-th
                         center (CAP_i · AT_i)

T_li, l = 1,2,...,k, i = 1,2,...,n
                       = the expected execution times of the k
                         entries in the i-th center's queue

TET_i                  = the total expected execution time for the
                         i-th center

m_i                    = the present size of the i-th queue

REW_li                 = the reward for doing the l-th job at the i-th
                         center (here reward is defined to be the profit
                         for running this job. If a job is transmitted
                         to another center, the reward is decreased by
                         the cost of transmitting the job and getting
                         back the results)

l                      = the position in the queue of the l-th job,
                         where the priority of the l-th job is greater
                         than or equal to the priority of the (l+1)-st
                         job and jobs are run in order
                         1,2,...,l,l+1,...,k

w_i = (\sum_{l=1}^{m_i} rew_li) / m_i
                       = weight in importance of the i-th center,
                         where \sum_{l=1}^{m_i} rew_li is in the range
                         (m_i · 1, m_i · x) and w_i is in the range (1,x)

(Remember that the user specifies the worth of his job by giving it a
number between 1 and x. We can, therefore, with the w_i get an estimate
of the kind of work a center is doing; a center with w_i = 2 would not
seem to be doing useful work, while a center with w_i >> x/2 would be
doing very useful work. We will assume that the x is the same for all
centers. Heuristically it seems that a w_i = x/2 for all i, i = 1,2,...,n
would be the best for the network since a center with w_i >> x/2 is doing
the more important work and is a bigger risk to the network if it should
fail. If jobs are distributed equally in importance then the risk is
about the same for all centers in the network. We mention this here but
we will not attempt to level the work loads so that the w_i's are
relatively equal.)

ET_job                 = the estimated time to run the refused job

P_job                  = the priority of the job (between 1 and x)

PROF_job               = the profit on the job

C_ij                   = the cost to transmit a job from the i-th
                         center to some center j (this cost is computed
                         by using the shortest path from center i to
                         center j according to the communication paths
                         described in section 3.2.)

TIME_j                 = the total estimated time to process all jobs
                         already in center j, plus the estimated time to
                         run the refused job (TET_j + ET_job)

then the algorithm proceeds as follows for a job l that must be
rerouted from center i to center j:

STEP I:    Obtain the w_i, i = 1,2,...,n, from the table.

STEP II:   Choose the center j not yet considered with the smallest
           remaining w_j and compute

               TIME_j = TET_j + ET_job

           If no more w_j remain, GO TO STEP V.

STEP III:  If TIME_j > CRIT_j RETURN TO STEP II; otherwise compute

               PROF_job = REW_li - C_ij

STEP IV:   If PROF_job < 0 RETURN TO STEP II; otherwise insert the
           job into the queue of this j-th center in the following way:
           given that the last job already in the queue that has a
           priority equal to P_job is found in the q-th position, insert
           the job in the (q+1)-st position and displace the lower
           priority jobs, if any, by ET_job/AT_j + 1 positions* in the
           queue; then STOP.

STEP V:    If the job has a higher priority than some jobs in the queue,
           insert the job into the i-th center as in STEP IV and then
           GO TO STEP II to re-route a job or jobs that had to be
           displaced.

Generally, this algorithm tends to choose the center that is doing less
useful work than the others, in the hope of increasing the w_j for this
center and making it more useful to the network.


*Performed in integer arithmetic because we still want our queue size
in terms of time if necessary.
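
The first four steps of the algorithm may be sketched as follows. This
is only an illustration under several assumptions that are not in the
text: the class and function names are hypothetical, the weight of a
center is taken as the mean user-assigned worth (priority) of its queued
jobs, an empty queue is given weight zero, TET includes the AT term used
in the Appendix, and a job with no equal-priority job in the queue is
placed after the last job of higher priority. STEP V is left to the
caller, who receives None when no other center accepts the job.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    et: int          # ET_job: estimated execution time
    priority: int    # P_job: user-assigned worth, between 1 and x
    reward: float    # REW: profit for running the job, before transmission


@dataclass
class Center:
    cap: int                                    # CAP_j: queue capacity
    at: int                                     # AT_j: average execution time
    queue: list = field(default_factory=list)   # non-increasing priority

    def crit(self):
        return self.cap * self.at               # CRIT_j = CAP_j * AT_j

    def tet(self):
        # Queued work plus AT_j for the job already in process.
        return sum(job.et for job in self.queue) + self.at

    def weight(self):
        if not self.queue:
            return 0.0                          # assumption: idle center
        return sum(job.priority for job in self.queue) / len(self.queue)

    def insert(self, job):
        """Insert after the last job of equal or higher priority.

        Returns the 1-based position taken and the number of equivalent
        average-job positions the job occupies (integer arithmetic).
        """
        q = sum(1 for queued in self.queue if queued.priority >= job.priority)
        self.queue.insert(q, job)
        return q + 1, job.et // self.at + 1


def reroute(job, origin, centers, cost):
    """Return the name of the accepting center, or None (go to STEP V).

    `centers` maps a name to a Center; `cost[(i, j)]` is C_ij.
    """
    others = [name for name in centers if name != origin]
    # STEP I and STEP II: consider centers in order of increasing weight.
    for name in sorted(others, key=lambda key: centers[key].weight()):
        center = centers[name]
        # STEP III: refuse if the center would exceed its critical time.
        if center.tet() + job.et > center.crit():
            continue
        # STEP IV: refuse if transmission would turn the reward into a loss.
        if job.reward - cost[(origin, name)] < 0:
            continue
        center.insert(job)
        return name
    return None
```

A caller would invoke reroute(job, "A", centers, cost) for a job refused
at a center named "A" and fall back to its own handling of STEP V when
the result is None.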
3.5.  The Load-Regulator-Dispatcher

We have thus far discussed the communications scheme for
our network, some of the reasons why a job might be refused and need
these communication paths, and finally the algorithm by which we
accomplish the rerouting. In order to achieve our goal of a workable
load-regulator-dispatcher we need to finish construction of the decision
table. We built the load regulator part of the table in Chapter 2 with
(c+1)^2 entries; we need now to add the dispatcher part to the table.

The following information is necessary for the algorithm and can
be divided into two categories: information stored in the
load-regulator-dispatcher permanently and information local to each
center that comes to the load-regulator-dispatcher as parameters of the
refused job:

LOCAL INFORMATION

A.  (1) ET_job      (2) P_job      (3) REW_li

    These are job parameters carried by each refused job.

B.  (1) CAP_i       (2) AT_i       (3) T_li

    Local statistics used for updating the information stored in the
    load-regulator-dispatcher.

LOAD-REGULATOR-DISPATCHER INFORMATION

    (1) TET_i       (2) CRIT_i     (3) w_i

    (4) C_ij        (5) m_i

    Information used for the algorithm and updated as the characteristics
    of a center change.

Our load-regulator-dispatcher table then takes on the following form
(see Table 3.2.).

The number of entries in the table is now 2(c+1)^2 + kc, a
number which is well within the realm of feasibility, especially when
all centers with the use of a normalizing factor can use the same table.
We also expect that with the same analysis of Chapter 2, we can reduce
this number. Our load-regulation table, with the addition of this
information, becomes a load-regulation-dispatching table, the generation
of which was our goal.
4.  CONCLUSION

In the preceding chapters, we discussed briefly the ideas
of  load  regulation  and  dispatching  in  a  network  of  computers.  We 
developed  analytical  tools  which  we  used  to  form  a  decision  table  in 
memory,  a  decision  table  that  adequately  handled  the  inhibition  or 
reduction  of  input  to  any  center  in  our  network.  We  then  extended  our 
load-regulator  into  a  load-regulator-dispatcher.  We,  therefore,  have 
the  tool  for  control  of  our  network  that  we  sought  to  find.  The  only 
limitation  to  this  scheme  would  be  one  of  memory  space,  and  how  much 
of  this  precious  resource  management  is  willing  to  lay  aside  for  this 
purpose. We have shown that the memory needs for the
load-regulator-dispatcher to work were not excessive in terms of the job
that it does for us.

It  could  be  noted,  at  this  point,  that  the  load  regulation  part 
of  our  table  could  be  used  in  a  single  center  for  control  of  jobs  on 
the  different  processors  and  future  research  may  look  at  this.  The 
algorithms that do the dispatching in our network could also be
sophisticated to a degree that management would have a hard time finding fault
with  these  ideas  for  control.  We  have,  in  this  paper,  then,  razed  the 
wall  that  stood  in  the  way  of  using  this  type  of  controller.   It  now 
remains  to  only  sidestep  or  climb  over  the  rubble  and  debris  left  behind 
to  reach  a  usable  load-regulator-dispatcher. 




APPENDIX 


We wish to express the queue size in terms of the number of
equivalent average jobs in queue so that our load regulator will still
work efficiently. If we let

m_i                   = the actual number of jobs in the i-th center's
                        queue

CAP_i                 = the capacity of the queue of the i-th center

AT_i                  = the average processing time of jobs that are run
                        at the i-th center (a figure gathered over a
                        representative period of time)

T_li, l = 1,2,...,m_i = the expected execution times of the m_i entries
                        in the i-th queue

TET_i                 = the total execution time for the i-th center

CRIT_i                = the critical execution time of the i-th center

ENJ                   = the equivalent number of average jobs in the queue

then,

    CRIT_i = CAP_i · AT_i     (a reasonable upper bound)

and

    TET_i = \sum_{l=1}^{m_i} T_li + AT_i                         (A-1)

where we add AT_i to represent the execution time of jobs already in
process at the i-th center. Finally, the equivalent queue size, that is
the number of equivalent average jobs (ENJ), is given by

    ENJ = (TET_i / CRIT_i) · CAP_i                               (A-2)

We can also see at this point that if

    TET_i / CRIT_i > 1

we should inhibit further inputs to the i-th center.


EXAMPLE

Let

    CAP_i = 100

    AT_i  = 50 units of time

then CRIT_i = CAP_i · AT_i = 5,000 units of time. Given that the following
m_i jobs (m_i = 20) are in the queue

    Job No.     Units of Expected Execution Time, T_li

       1            150
       2             25
       3             40
       4             90
       5            110
       6             30
       7             10
       8             75
       9             50
      10            175
      11             45
      12             55
      13             20
      14             90
      15            200
      16             70
      17             60
      18            100
      19            220
      20             80

then applying (A-1) we find

    TET_i = 1695 + 50 = 1745

and from (A-2) we get

    ENJ = (1745/5000) · 100 = 34.9, or about 35

This implies that for this situation, when our queue size is polled
at a discrete sampling time KS, for some K, the equivalent queue size
sent back to the load regulator should be representative of the number
of average jobs (35) and not the actual figure (20).
We should also note that if

    AT_i · m_i ≥ TET_i

(i.e. the expected total execution time for m_i jobs is less than or equal
to the average), then the equivalent queue size sent back to the load
regulator should be m_i. Therefore, we stipulate that the lowest number
that can represent the equivalent queue size is the actual size of the
queue in terms of the number of jobs (i.e. CAP_i ≥ ENJ ≥ m_i). We do this
because it is too hard to change the hardware configuration of the queue
which specifies that each job should occupy a specific physical space
(i.e. four words). It would, therefore, be unwise to attempt to cram
four or five short jobs into the physical space generally occupied by
one or two.
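
The computation of this appendix can be checked with a short sketch.
The function name equivalent_queue_size is hypothetical, and rounding
ENJ to the nearest whole job is an assumption made here to match the
figure of 35 in the example; the bounds CAP ≥ ENJ ≥ m and the inhibit
test TET/CRIT > 1 follow the text.

```python
def equivalent_queue_size(times, cap, at):
    """Return (ENJ, inhibit) for one center.

    times: expected execution times T of the jobs now in the queue
    cap:   CAP, the queue capacity
    at:    AT, the average processing time of jobs at this center
    """
    m = len(times)                  # actual number of jobs in the queue
    crit = cap * at                 # CRIT = CAP * AT
    tet = sum(times) + at           # (A-1): add AT for the job in process
    enj = round(tet / crit * cap)   # (A-2), rounded to whole jobs (assumed)
    enj = max(m, min(cap, enj))     # keep CAP >= ENJ >= m
    inhibit = tet > crit            # TET/CRIT > 1: inhibit further inputs
    return enj, inhibit


EXAMPLE_TIMES = [150, 25, 40, 90, 110, 30, 10, 75, 50, 175,
                 45, 55, 20, 90, 200, 70, 60, 100, 220, 80]
```

With the example data, equivalent_queue_size(EXAMPLE_TIMES, 100, 50)
reports 35 equivalent average jobs and no inhibition, while a queue of
twenty short jobs is reported as 20, the actual queue size.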